Spatial Analyses
Chapter 12
As in previous weeks, for today’s lab you are to create a new R markdown document using the class lab report template. Within your .Rmd document, include all you code, resulting plots, and answers to questions below.
When you are done with your report, use knitr to convert it to .PDF format to submit on Canvas. It is important that you document each step of your workflow using comments and that you break up the sections of your analysis into SEPARATE code chunks.
Today’s investigation. In this chapter you will learn about Light Detection and Ranging (LiDAR) data. You will learn to use point cloud data and lidar rasters in R and explore using QGIS - a free, open-source GIS tool. You will also explore the concept of uncertainty surrounding lidar raster data (and remote sensing data in general).
Lidar is an active remote sensing technique that measures vegetation height. Learn more about discrete and full waveform LIDAR and how to use LIDAR data.
After completing this tutorial, you will be able to:
Lidar or Light Detection and Ranging is an active remote sensing system that can be used to measure vegetation height across wide areas.
Watch the videos below to better understand what lidar is and how a lidar system works.
The Story of Lidar Data
How Lidar Works
Scientists often need to characterize vegetation over large regions. We use tools that can estimate key characteristics over large areas because we don’t have the resources to measure each individual tree. These tools often use remote methods. Remote sensing means that we aren’t actually physically measuring things with our hands, we are using sensors which capture information about a landscape and record things that we can use to estimate conditions and characteristics.
To measure vegetation across large areas we need remote sensing methods
that can collect many measurements quickly using automated sensors.
These measurements can be used to estimate forest structure across
larger areas. Lidar (also sometimes referred to as active laser
scanning) is one remote sensing method that can be used to map structure
including vegetation height, density and other characteristics across a
region. Lidar directly measures the height and density of vegetation (as
well as buildings and other objects) on the ground making it an ideal
tool for scientists studying vegetation over large areas.
Lidar is an active remote sensing system. An active system means that the system itself generates energy - in this case light - to measure things on the ground. In a lidar system, light is emitted from a rapidly firing laser. This light travels to the ground and reflects off of things like buildings and tree branches. The reflected light energy then returns to the lidar sensor where it is recorded.
A lidar system measures the time it takes for emitted light to travel to the ground and back. That time is used to calculate distance traveled. Distance traveled is then converted to elevation. These measurements are made using the key components of a lidar system including a GPS that identifies the X,Y,Z location of the light energy and an Internal Measurement Unit (IMU) that provides the orientation of the plane in the sky.
Light energy is a collection of photons. As the photons that make up light move towards the ground, they hit objects such as branches on a tree. Some of the light reflects off of those objects and returns to the sensor. If the object is small and there are gaps surrounding it that allow light to pass through, some light continues down towards the ground. Because some photons reflect off of things like branches but others continue down towards the ground, multiple reflections may be recorded from one pulse of light.
The distribution of energy that returns to the sensor creates what we call a waveform. The amount of energy that returned to the lidar sensor is known as “intensity”. The areas where more photons or more light energy returns to the sensor create peaks in the distribution of energy. Theses peaks in the waveform often represent objects on the ground like a branch, a group of leaves or a building.
There are many different uses for lidar data:
Lidar data have also been used to derive information about vegetation structure including:
A waveform or distribution of light energy is what returns to the lidar sensor. This return may be recorded in two different ways:
A discrete return lidar system identifies individual (discrete) peaks and records a point at each peak location in the waveform curve. These discrete or individual points are called returns. A discrete system may record 1-4 (and sometimes more) returns from each laser pulse.
A full waveform lidar system records a distribution of returned light energy. Full waveform lidar data are thus more complex to process, however, they can often capture more information compared to discrete return lidar systems.
Whether they are collected as discrete points or full waveform, most often lidar data are available as discrete points. A collection of discrete return lidar points is known as a lidar point cloud.
The commonly used file format to store lidar point cloud data is the
.las format. The .laz format is a highly
compressed version of .las and is becoming more widely
used.
Lidar data attributes can vary, depending upon how the data were collected and processed. You can determine what attributes are available for each lidar point by looking at the metadata.
All lidar data points will have:
X,Y Location Information: determines the x,y coordinate location of the object that the lidar pulse (the light) reflected off of
Z (elevation values): represents the elevation of the object that the lidar pulse reflected off of
Most lidar data points will have:
Some lidar point cloud data will also be “classified”. Classification refers to tagging each point with the object that it reflected off of. So if a pulse reflects off of a tree branch, you would assign it to the class “vegetation.” If the pulse reflects off of the ground, you would assign it to the class “ground.” Classification of lidar point clouds is an additional processing step. Classification simply represents the type of object that the laser return reflected off of. So if the light energy reflected off of a tree, it might be classified as “vegetation”. And if it reflected off of the ground, it might be classified as “ground”.
Some lidar products will be classified as “ground/non-ground.” Some datasets will be further processed to determine which points reflected off of buildings and other infrastructure. Some lidar data will be classified according to the vegetation type.
A data product is the data that are DERIVED from an instrument, or information collected on the ground. For instance, you may go out in the field and measure the heights of trees at 20 plots then calculate an average height per plot. The average value is DERIVED from the individual measurements that you collected in the field.
When dealing with sensor data, the sensors often collect data in a format that needs to be processed in order to get usable values from it.
The point cloud is one of the commonly found lidar data products and is the “native” format for discrete return lidar data. In this lesson you will explore some point cloud data using the plas.io viewer.
In this activity, you will open a .las file, in the
plas.io free online lidar data viewer. You will then
explore some of the attributes associated with a lidar data point
cloud.
Remember that not all lidar data are created equally. Different lidar data may have different attributes. In this activity, you will look at data that contain both intensity values and a ground vs non ground classification.
About plas.io: Plasio is a project by Uday Verma and Howard Butler that implements point cloud rendering capability in a browser. Specifically, it provides a functional implementation of the ASPRS LAS format, and it can consume LASzip-compressed data using LASzip NaCl module.
Notice that the data, upon initial view, are colored in a black - white color scheme. These colors represent the data’s intensity values. Remember that the intensity value for each lidar point represents the amount of light energy that reflected off of an object and returned to the sensor. In this case, darker colors represent LESS light energy returned. Lighter colors represent MORE light returned.
Next, scroll down through the tools in plas.io. Look for the Intensity Scaling slider. The intensity scaling slider allows you to define the thresholds of light to dark intensity values displayed in the image (similar to stretching values in an image processing software or even in photoshop).
Drag the slider back and forth. Notice that you can brighten up the data using the slider.
In addition to intensity values, these lidar data also have a classification value. Lidar data classification values are numeric, ranging from 0-20 or higher. Some common classes include:
In this case, these data are classified as either ground, or non-ground. To view the points, colored by class:
Point clouds provide a lot of information, scientifically. However, they can be difficult to work with given the size of the data and tools that are available to handle large volumns of points. Lidar data products are often created and stored in a gridded or raster data format. The raster format can be easier for many people to work with and also is supported by many different commonly used software packages.
aster or “gridded” data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. A raster file is a composed of regular grid of cells, all of which are the same size. You’ve looked at and used rasters before if you’ve looked at photographs or imagery in a tool like Google Earth. However, the raster files that you will work with are different from photographs in that they are spatially referenced. Each pixel represents an area of land on the ground. That area is defined by the spatial resolution of the raster.
A few notes about rasters:
A raster dataset can have attributes associated with it as well. For instance in a lidar derived digital elevation model (DEM), each cell represents an elevation value for that location on the earth. In a lidar derived intensity image, each cell represents a lidar intensity value or the amount of light energy returned to and recorded by the sensor.
There are different ways to create a raster from lidar point clouds. Let’s look at one of the most basic ways to create a raster file points: gridding. When you grid raster data, you calculate a value for each pixel or cell in your raster dataset using the points that are spatially located within that cell. To do this:
A different approach is to interpolate the value for each cell. Interpolation considers the values of points outside of the cell in addition to points within the cell to calculate a value. Interpolation also often uses statistical operations (math) to calculate the cell value. Interpolation is useful because it can provide us with some ability to predict or calculate cell values in areas where there are no data (or no points). And to quantify the error associated with those predictions which is useful to know, if you are doing research. We won’t be using interpolation in this lab.
To work with raster data in R, you can use the
raster and sf packages.
# load libraries
library(raster)
library(sf)
library(ggplot2)
You use the raster(“path-to-raster-here”) function to open a raster dataset in R. Note that you use the plot() function to plot the data. The function argument main = “” adds a title to the plot.
# open raster data
lidar_dem <- raster(x = "BLDR_LeeHill/pre-flood/lidar/pre_DTM.tif")
# plot raster data
plot(lidar_dem,
main = "Digital Elevation Model - Pre 2013 Flood")
If you zoom in on a small section of the raster, you can see the individual pixels that make up the raster. Each pixel has one value associated with it. In this case that value represents the elevation of ground.
Note that you are using the xlim= argument to zoom in to on region of
the raster. You can use xlim and ylim to
define the x and y axis extents for any plot.
# zoom in to one region of the raster
plot(lidar_dem,
xlim = c(473000, 473030), # define the x limits
ylim = c(4434000, 4434030), # define y limits for the plot
main = "Lidar Raster - Zoomed into one small region")
Next, let’s discuss some of the important spatial attributes associated with raster data.
The coordinate reference system (CRS) of a spatial
object tells R where the raster is located in geographic space. It also
tells R what method should be used to “flatten” or project the raster in
geographic space.
You will learn CRS in more detail in next weeks class.
For this week, just remember that data from the same location but saved
in different coordinate references systems will not line up in any GIS
or other program. Thus, it’s important when working with spatial data in
a program like R to identify the coordinate reference
system applied to the data and retain it throughout data processing and
analysis.
You can view the CRS string associated with your
R object using the crs() method. You can
assign this string to an R object too.
# view resolution units
crs(lidar_dem)
## CRS arguments:
## +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
# assign crs to an object (class) to use for reprojection and other tasks
myCRS <- crs(lidar_dem)
myCRS
## CRS arguments:
## +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
The CRS string for our lidar_dem object
tells us that your data are in the UTM projection.
~The UTM zones across the continental United States. Source: Chrismurf, wikimedia.org.
The CRS format, returned by R, is in a
PROJ 4 format. This means that the projection information
is strung together as a series of text elements, each of which begins
with a + sign.
+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
You’ll focus on the first few components of the CRS in this activity:
+proj=utm The projection of the dataset. Your data are
in Universal Transverse Mercator (UTM).+zone=18 The UTM projection divides up the
world into zones, this element tells you which zone the data is in.
Harvard Forest is in Zone 18.+datum=WGS84 The datum was used to define the center
point of the projection. Your raster uses the WGS84 datum.+units=m This is the horizontal units
that the data are in. Your units are meters.Important: You are working with lidar data which has a Z or vertical value as well. While the horizontal units often match the vertical units of a raster they don’t always! Be sure to check the metadata of your data to figure out the vertical units!
Spatial Extent
Great Work!
This lab activity was originally written by [https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/](https://wwmodified by Jenna Ekwealor.